
    Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets

    The scale of functional magnetic resonance imaging (fMRI) data is rapidly increasing as large multi-subject datasets become widely available and high-resolution scanners are adopted. The inherent low dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99x and 1812x speedups on these two methods and enable the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5x, respectively, with 20 nodes on real datasets. We also demonstrate weak scaling on a synthetic dataset with 1024 subjects, on up to 1024 nodes and 32,768 cores.
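
    As a minimal illustration of the kind of factor model involved, the numpy sketch below implements the deterministic Shared Response Model update, alternating between a shared-response estimate and a per-subject orthogonal Procrustes solve. It is illustrative only and is not the authors' optimized or distributed implementation; the array sizes, iteration count, and initialization are arbitrary assumptions.

    import numpy as np

    def fit_srm(X, k=10, n_iter=10, seed=0):
        """Deterministic Shared Response Model sketch.

        X: list of (voxels_i x timepoints) arrays, one per subject.
        Returns per-subject orthogonal maps W_i and the shared response S.
        """
        rng = np.random.default_rng(seed)
        # Random orthogonal initialization of each subject's mapping (voxels_i x k).
        W = [np.linalg.qr(rng.standard_normal((Xi.shape[0], k)))[0] for Xi in X]
        for _ in range(n_iter):
            # Shared response: average of back-projected subject data (k x timepoints).
            S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, X)], axis=0)
            # Subject maps: orthogonal Procrustes solution per subject.
            W = []
            for Xi in X:
                U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
                W.append(U @ Vt)
        return W, S

    # Toy usage: 3 subjects, 50 voxels each, 100 timepoints, 10 shared features.
    rng = np.random.default_rng(1)
    subjects = [rng.standard_normal((50, 100)) for _ in range(3)]
    W, S = fit_srm(subjects, k=10)
    print(S.shape)  # (10, 100)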

    The LDBC Graphalytics Benchmark

    In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and robustness (against failures and performance variability). The benchmark also balances comprehensiveness with the runtime necessary to obtain these deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.
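
    Because the selected algorithms are deterministic, a platform's output can be compared exactly against the reference output. As a toy illustration, not the Graphalytics harness, driver API, or data format, the Python sketch below computes breadth-first-search depths, a classic full-graph analysis kernel, and validates the result against a hand-written reference.

    from collections import deque

    def bfs_depths(adj, source):
        """Reference BFS: depth of each vertex reachable from `source`.

        adj: dict mapping vertex -> iterable of neighbour vertices.
        Unreachable vertices are simply absent from the result.
        """
        depth = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
        return depth

    def validate(result, reference):
        # Deterministic output allows exact per-vertex comparison.
        return result == reference

    # Toy directed graph and a hand-written reference result.
    adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
    print(bfs_depths(adj, 1))                                      # {1: 0, 2: 1, 3: 1, 4: 2}
    print(validate(bfs_depths(adj, 1), {1: 0, 2: 1, 3: 1, 4: 2}))  # True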

    The BTWorld Use Case for Big Data Analytics: Description, MapReduce Logical Workflow, and Empirical Evaluation

    The commoditization of big data analytics, that is, the deployment, tuning, and future development of big data processing platforms such as MapReduce, relies on a thorough understanding of relevant use cases and workloads. In this work we propose BTWorld, a use case for time-based big data analytics that is representative of processing data collected periodically from a global-scale distributed system. BTWorld enables a data-driven approach to understanding the evolution of BitTorrent, a global file-sharing network that has over 100 million users and accounts for a third of today’s upstream traffic. We describe for this use case the analyst questions and the structure of a multi-terabyte data set. We design a MapReduce-based logical workflow, which includes three levels of data dependency (inter-query, inter-job, and intra-job) and a diversity of queries that together make the BTWorld use case challenging for today’s big data processing tools; the workflow can be instantiated in various ways in the MapReduce stack. Last, we instantiate this complex workflow using Pig–Hadoop–HDFS and evaluate the use case empirically. Our MapReduce use case has challenging features: small (kilobytes) to large (250 MB) data sizes per observed item, excellent (10^-6) and very poor (10^2) selectivity, and short (seconds) to long (hours) job durations.
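
    As a toy, single-process illustration of the workflow's data-flow style, not the actual Pig–Hadoop–HDFS implementation and with hypothetical record fields, the Python sketch below aggregates periodic tracker observations in a map phase and a reduce phase and then reports selectivity as the ratio of output records to input records.

    from collections import defaultdict

    # Hypothetical input records: (timestamp, tracker, swarm_hash, peer_count).
    records = [
        ("2013-01-01T00:00", "trackerA", "s1", 120),
        ("2013-01-01T00:00", "trackerA", "s2", 30),
        ("2013-01-01T00:00", "trackerB", "s3", 75),
    ]

    def map_phase(records):
        # Emit ((timestamp, tracker), peer_count) key-value pairs.
        for ts, tracker, _swarm, peers in records:
            yield (ts, tracker), peers

    def reduce_phase(pairs):
        # Sum peer counts per (timestamp, tracker) key.
        totals = defaultdict(int)
        for key, peers in pairs:
            totals[key] += peers
        return dict(totals)

    output = reduce_phase(map_phase(records))
    print(output)

    # Selectivity in the sense used above: output records per input record.
    print(len(output) / len(records))  # ~0.67 here; the real workflow spans 10^-6 to 10^2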

    Reliability, Validity, and Sensitivity to Change of the Cochin Hand Functional Disability Scale and Testing the New 6-Item Cochin Hand Functional Disability Scale in Systemic Sclerosis

    Systemic sclerosis (SSc) is a chronic autoimmune disease causing complex hand disability. A reliable tool for hand function assessment in SSc is the Cochin Hand Functional Disability Scale (CHFS). More recently, a 6-item short form of the CHFS (CHFS-6) has been developed. OBJECTIVES To validate the CHFS and the new CHFS-6 in Romanian patients with SSc. PATIENTS AND METHODS Consecutive patients with SSc who completed the CHFS were included. All patients were assessed according to the recommendations of the European Scleroderma Trials and Research (EUSTAR) group and also completed the Scleroderma Health Assessment Questionnaire and the Hand Mobility in Scleroderma questionnaire. Finger range-of-motion distances were measured. RESULTS Seventy patients, 63 female and 7 male (median age, 53.0 years; interquartile range [IQR], 21.0 years), were included. Twenty-seven had diffuse cutaneous involvement (dcSSc). Median CHFS and CHFS-6 at baseline were 25.0 (IQR, 37.0) and 8.0 (IQR, 13.0), respectively. The internal consistency (Cronbach α = 0.96 and 0.90, respectively, in all 70 patients) and test-retest reliability (intraclass correlation coefficient = 0.98 for both, in 38 patients) of the CHFS and CHFS-6 were excellent. The CHFS-6 had a very high correlation with the CHFS. There were moderate to good correlations with the Hand Mobility in Scleroderma questionnaire, the Scleroderma Health Assessment Questionnaire, and the anthropometric measurements (construct validity). In patients with early dcSSc who had a second evaluation, we found moderate to good sensitivity to change (standardized response mean of 0.8 and effect size of 0.4 for the CHFS, and standardized response mean of 1.1 and effect size of 0.6 for the CHFS-6). CONCLUSIONS The CHFS and CHFS-6 are valid and easy-to-use tools for assessing hand involvement in SSc, which can be used in clinical or research settings.
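
    For reference, the reported psychometric quantities follow standard formulas: Cronbach α for internal consistency, the standardized response mean (mean change divided by the standard deviation of the change), and the effect size (mean change divided by the baseline standard deviation). The numpy sketch below computes them on synthetic scores, not on the study data.

    import numpy as np

    def cronbach_alpha(items):
        """items: (n_patients x n_items) array of item scores."""
        k = items.shape[1]
        item_var = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_var / total_var)

    def responsiveness(baseline, follow_up):
        """Standardized response mean and effect size of the change scores."""
        change = follow_up - baseline
        srm = change.mean() / change.std(ddof=1)
        effect_size = change.mean() / baseline.std(ddof=1)
        return srm, effect_size

    # Synthetic example: 38 patients, 6 correlated items; a negative change
    # means improvement on a disability scale.
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((38, 1))
    items = latent + 0.5 * rng.standard_normal((38, 6))
    print(round(cronbach_alpha(items), 2))

    baseline = items.sum(axis=1)
    follow_up = baseline - rng.normal(2.0, 3.0, size=38)
    print(responsiveness(baseline, follow_up))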

    End-to-end programmable computing systems

    Recent technological advances have contributed to the rapid increase in the algorithmic complexity of applications, ranging from signal processing to autonomous systems. To control this complexity and endow heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that mines the complexity of high-level programs down to low-level virtual machine intermediate representation, extracts specific computational patterns, and predicts which code segments run best on a core in heterogeneous hardware. PGL extracts multifractal features from code graphs and exploits graph representation learning strategies for automatic parallelization and correct assignment to heterogeneous processors. A comprehensive evaluation of PGL on existing and emerging complex software demonstrates speedups of 6.42x and 2.02x over thread-based execution and state-of-the-art techniques, respectively. Our PGL framework leads to higher processing efficiency, which is crucial for future AI and high-performance computing applications such as autonomous vehicles and machine vision.
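
    The full PGL pipeline mines LLVM intermediate representation, extracts multifractal features, and trains a graph representation learning model, none of which is reproduced here. The numpy sketch below only illustrates the underlying idea, message passing over a code graph followed by a classifier that assigns a code segment to a processor, using made-up node features and weights.

    import numpy as np

    def message_pass(features, adj, rounds=2):
        """Mean-aggregation message passing over a code graph.

        features: (n_nodes x d) initial node features (e.g., opcode one-hots).
        adj:      (n_nodes x n_nodes) adjacency matrix of the code graph.
        Returns a whole-graph embedding (mean of node states).
        """
        deg = adj.sum(axis=1, keepdims=True).clip(min=1)
        h = features
        for _ in range(rounds):
            h = 0.5 * h + 0.5 * (adj @ h) / deg  # mix self and neighbour information
        return h.mean(axis=0)

    def predict_device(embedding, weights, bias):
        # Toy linear classifier: positive score -> map the segment to the GPU.
        return "GPU" if embedding @ weights + bias > 0 else "CPU"

    # Toy code graph: 4 IR instructions in a chain, one-hot "opcode" features.
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    features = np.eye(4)
    embedding = message_pass(features, adj)
    print(predict_device(embedding, weights=np.ones(4), bias=-0.5))  # "GPU" with these made-up weights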